The Quaero Evaluation Initiative on Term Extraction

Authors

  • Thibault Mondary
  • Adeline Nazarenko
  • Haïfa Zargayouna
  • Sabine Barreaux
Abstract

The Quæro program organized a set of evaluations of terminology extraction systems in 2010 and 2011. The initiative had three objectives: to evaluate the behavior and scalability of term extractors with respect to corpus size, to assess progress between different versions of the same systems, and to measure the influence of corpus type. The protocol was a comparative analysis of 32 runs against a gold standard. Scores were computed using metrics that take gradual relevance into account. Systems produced by Quæro partners and publicly available systems were evaluated on pharmacology corpora composed of European patents or abstracts of scientific articles, all in English. The gold standard was an unstructured version of the pharmacology thesaurus used by INIST-CNRS for indexing purposes. Most systems scaled to large corpora, differences between versions of the same system varied, and results were better on scientific articles than on patents. During the ongoing adjudication phase, domain experts are enriching the thesaurus with terms found by several systems.

Similar resources

SIBM at CLEF eHealth Evaluation Lab 2016: Extracting Concepts in French Medical Texts with ECMT and CIMIND

This paper presents SIBM’s participation in the Multilingual Information Extraction task 2 of the CLEF eHealth 2016 evaluation initiative, which focuses on named entity recognition in French written text. We report on the indexing of the provided QUAERO dataset with multiple knowledge organization systems (KOS) partially or totally translated into French. The extraction method is available online ...

Hybrid Citation Extraction from Patents

The Quaero project organized a set of evaluations of named entity recognition systems in 2009, including reference extraction in patent text. LIMSI participated in this evaluation. The task and its metrics are presented, followed by a complete system description and the evaluation results. The system obtained a (tied) first place in the evaluation.

The 2013 KIT Quaero Speech-to-Text System for French

This paper describes our Speech-to-Text (STT) system for French, which was developed as part of our efforts in the Quaero program for the 2013 evaluation. Our STT system consists of six subsystems which were created by combining multiple complementary sources of pronunciation modeling including graphemes with various feature front-ends based on deep neural networks and tonal features. Both spea...

Extended Named Entities Annotation on OCRed Documents: From Corpus Constitution to Evaluation Campaign

Within the framework of the Quaero project, we proposed a new definition of named entities, based upon an extension of the coverage of named entities as well as the structure of those named entities. In this new definition, the extended named entities we proposed are both hierarchical and compositional. In this paper, we focused on the annotation of a corpus composed of press archives, OCRed fr...

Publication date: 2012